A Method for Improving Automatic Word Categorization

Author

  • Emin Erkan Korkmaz
Abstract

A METHOD FOR IMPROVING AUTOMATIC WORD CATEGORIZATION
Korkmaz, Emin Erkan
M.S., Department of Computer Engineering
Supervisor: Ass. Prof. Dr. Göktürk Üçoluk
September 1997, 57 pages

This thesis presents a new approach to automatic word categorization that improves both the efficiency of the algorithm and the quality of the resulting clusters. The unigram and bigram statistics of a corpus of about two million words are used with an efficient distance function to measure the similarity of words, and a greedy algorithm places the words into clusters. Notions from fuzzy clustering, such as cluster prototypes and degree of membership, are used to form the clusters. Different distance metrics are analyzed using the algorithm. Empirical comparisons are made to support the discussion of which type of distance metric is most suitable for measuring the similarity between linguistic elements. The algorithm is unsupervised, and the number of clusters is determined at run time.
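
The abstract does not give the distance function or the clustering procedure in detail, so the following is only a minimal sketch of the general idea, not the thesis's algorithm. It assumes that each word is represented by the relative frequencies of its left and right neighbours (derived from unigram and bigram counts), that the distance is a plain Manhattan distance, and that a word joins the nearest cluster prototype when the distance is within a threshold. The function names, the `threshold` parameter, and the hard (non-fuzzy) assignment are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def context_vectors(tokens):
    """Map each word to the relative frequencies of its left/right neighbours."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vectors = defaultdict(dict)
    for (left, right), count in bigrams.items():
        # "R:x" = share of occurrences followed by x; "L:x" = preceded by x.
        vectors[left]["R:" + right] = count / unigrams[left]
        vectors[right]["L:" + left] = count / unigrams[right]
    return dict(vectors)


def manhattan(u, v):
    """Assumed distance metric: sum of absolute differences over all features."""
    keys = set(u) | set(v)
    return sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in keys)


def update_prototype(proto, vec, n):
    """Running mean of a prototype that currently summarizes n members."""
    keys = set(proto) | set(vec)
    return {k: (proto.get(k, 0.0) * n + vec.get(k, 0.0)) / (n + 1) for k in keys}


def greedy_clusters(vectors, distance, threshold):
    """Greedy pass: join the nearest prototype within threshold, else open a cluster.

    The number of clusters is not fixed in advance; it falls out of the data.
    """
    prototypes, members = [], []
    for word in sorted(vectors):
        vec = vectors[word]
        best, best_d = None, threshold
        for i, proto in enumerate(prototypes):
            d = distance(vec, proto)
            if d <= best_d:
                best, best_d = i, d
        if best is None:
            prototypes.append(dict(vec))
            members.append([word])
        else:
            prototypes[best] = update_prototype(prototypes[best], vec, len(members[best]))
            members[best].append(word)
    return members


if __name__ == "__main__":
    toy = "the cat sat . the dog sat . a cat ran . a dog ran .".split()
    for cluster in greedy_clusters(context_vectors(toy), manhattan, threshold=0.6):
        print(cluster)
```

Passing the distance as an argument makes it straightforward to swap in other metrics and compare the resulting clusters, which is the kind of comparison the abstract describes.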


Similar resources

Emin Erkan Korkmaz and Göktürk Üçoluk (1997). A Method for Improving Automatic Word Categorization

This paper presents a new approach to automatic word categorization which improves both the efficiency of the algorithm and the quality of the formed clusters. The unigram and the bigram statistics of a corpus of about two million words are used with an efficient distance function to measure the similarities of words, and a greedy algorithm to put the words into clusters. The notions of fuzzy clust...


Method for Improving Automatic Word Categorization

This paper presents a new approach to automatic word categorization which improves both the efficiency of the algorithm and the quality of the formed clusters. The unigram and the bigram statistics of a corpus of about two million words are used with an efficient distance function to measure the similarities of words, and a greedy algorithm to put the words into clusters. The notions of fuzzy c...


Choosing a Distance Metric for Automatic Word Categorization

Emin Erkan Korkmaz, Göktürk Üçoluk. Department of Computer Engineering, Middle East Technical University, Ankara, Turkey. Abstract: This paper analyzes the functionality of different distance metrics that can be used in a bottom-up unsupervised algorithm for automatic word categorization. The proposed method uses a mod...


Improving automatic image annotation: Approach by Bag-Of- Key Point

Automatic image annotation associates each image with a set of keywords describing its visual content, using an automatic system without any human intervention. Many approaches have been proposed for realizing such a system. However, it is still inefficient in terms of semantic description of the image. Recent works show a frequent use of a special technique known as bag...


Automatic Selection of Reference Pages in Wikipedia for Improving Targeted Entities Disambiguation

A 59 A Knowledge-based Representation for Cross-Language Document Retrieval and Categorization. Marc Franco-Salvador, Paolo Rosso and Roberto Navigli
A 10170 A Probabilistic Approach to Persian Ezafe Recognition. Habibollah Asghari, Heshaam Faili and Jalal Maleki
A 10137 Acquiring a Dictionary of Emotion-Provoking Events. Hoa Trong Vu, Graham Neubig, Sakriani Sakti, Tomoki Toda and Satoshi Nakamur...




Publication date: 1997